Abstract

In this paper, we describe our clinical question answering system
developed and submitted for the Text Retrieval Conference (TREC 2014)
Clinical Decision Support (CDS) track. The task for this track was to
retrieve relevant biomedical articles to answer generic clinical questions
about medical case reports. As part of our maiden participation in TREC,
we submitted a single run using a hybrid Natural Language Processing
(NLP)-driven approach to accomplish the given task. Evaluation results
showed that our clinical question answering system achieved the best
scores in two of eight dual-judged topics (#5 and 27) and outperformed
the median scores for topics #13, 18, 19, 22, and 23.

1 Introduction

The TREC 2014 CDS track¹ aims at investigating techniques to improve
patient care by providing pertinent biomedical information related to
medical case reports. The primary motivation for such a task is the use
case in which a clinician seeks relevant research-based evidence on how
best to care for patients at the point of care. For example, the clinician
may require specific information on the patient’s most likely diagnosis
given a list of signs/symptoms, the most essential tests/procedures in a
given scenario, and the most effective treatment plan given a diagnosis.
In some cases, these types of information can be obtained from published
biomedical literature that can eventually serve as potential clinical
evidence to support patient care. However, due to the exponential growth
of publications in the biomedical domain over the years, it has become
nearly impossible to manually mine such a huge volume of scientific
information to find the most relevant and up-to-date details for a
particular clinical scenario. Intelligent CDS systems can help overcome
this difficulty through automated clinical question answering. Hence, the
main goal of the TREC 2014 CDS track is to promote research on systems
that can satisfy the information needs of clinicians by retrieving
relevant biomedical articles to answer generic clinical questions.

¹ http://www.trec-cds.org/

The proposed task for this track was to retrieve a ranked list of the top
1000 biomedical articles that can answer questions related to multiple
categories of clinical information needs. In particular, short medical
case reports were associated with one of three generic clinical questions:
“What is the patient’s diagnosis?”, “What tests should the patient
receive?”, and “How should the patient be treated?”. The retrieved
articles were judged in terms of their relevance to the corresponding
clinical question associated with a given case report. Our submission for
the CDS track uses a variety of NLP-based techniques to address the
clinical questions provided. We present a description of our approach and
discuss our experimental setup, results, and evaluation in the subsequent
sections.

Report Documentation Page (Standard Form 298)

Report date: NOV 2014
Title: A Hybrid Approach to Clinical Question Answering
Performing organization: Philips Research North America, 345 Scarborough
Rd, Briarcliff Manor, NY, 10510
Distribution/availability statement: Approved for public release;
distribution unlimited
Supplementary notes: Presented in the proceedings of the Twenty-Third Text
REtrieval Conference (TREC 2014) held in Gaithersburg, Maryland, November
19-21, 2014. The conference was co-sponsored by the National Institute of
Standards and Technology (NIST) and the Defense Advanced Research Projects
Agency (DARPA).

2 Description of Our Approach

Our hybrid NLP-driven method combines syntactic, semantic, and filtering
processes to extract relevant biomedical articles corresponding to the
clinical concepts (diagnoses, treatments, and/or tests) relevant to each
given topic. Our overall approach centers on three main processes: (i)
Topical Keyword Extraction: extraction of ontology-based topical keywords
(e.g., findings, disorders, body structures, procedures, tests, and
treatments) along with demographic information from the given medical case
reports (i.e., topic descriptions); (ii) Knowledge-based Clinical
Inferencing: use of topical keywords as queries to a third-party clinical
knowledge base and extraction of a ranked list of inferred
diagnoses/tests/treatments corresponding to each given topic; and (iii)
Biomedical Literature Retrieval: retrieval and ranking of pertinent
biomedical articles based on the keywords, concepts, and the ranked list
of inferred diagnoses/tests/treatments extracted in the prior steps.

As an initial step, we extract topical keywords from the topic
descriptions and map the keywords to categories represented in clinical
domain ontologies (e.g., findings, disorders, and treatments), in addition
to retrieving demographic details from the topic descriptions. Clinical
domain ontologies are effective in this step because they were built to
promote a standard clinical vocabulary and are widely used to semantically
categorize clinical concepts and to facilitate information exchange and
interoperability (Bodenreider, 2008; Stenzhorn et al., 2008; Garde et al.,
2007). We use the following clinical domain ontologies: SNOMED CT²
(Cornet and de Keizer, 2008) for diagnoses, LOINC³ for tests, and RxNorm⁴
for treatments.
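The keyword-to-category mapping and demographic extraction described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the tiny `TOY_LEXICON` dictionary is a hypothetical stand-in for real SNOMED CT, LOINC, and RxNorm lookups, and the age/gender regular expression and both function names are assumptions made for illustration only.

```python
import re

# Hypothetical mini-lexicon standing in for ontology lookups
# (SNOMED CT, LOINC, RxNorm); the entries are illustrative only.
TOY_LEXICON = {
    "chest pain": "finding",
    "hypertension": "disorder",
    "electrocardiogram": "test",
    "aspirin": "treatment",
}

# Assumed pattern for phrases such as "58-year-old woman".
AGE_GENDER = re.compile(
    r"(?P<age>\d{1,3})[- ]year[- ]old\s+"
    r"(?P<gender>man|woman|male|female|boy|girl)",
    re.IGNORECASE,
)


def extract_topical_keywords(description, lexicon=TOY_LEXICON):
    """Return sorted (term, category) pairs for lexicon terms in the text."""
    text = description.lower()
    found = []
    for term, category in lexicon.items():
        # Whole-word match so that short terms do not fire inside longer words.
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            found.append((term, category))
    return sorted(found)


def extract_demographics(description):
    """Return age and gender if the assumed pattern matches, else {}."""
    match = AGE_GENDER.search(description)
    if match is None:
        return {}
    return {
        "age": int(match.group("age")),
        "gender": match.group("gender").lower(),
    }
```

A production system would replace the dictionary with ontology-backed concept recognition; the sketch only shows the shape of the output passed to the later steps.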

² http://www.ihtsdo.org/snomed-ct/
³ http://loinc.org/
⁴ http://www.nlm.nih.gov/research/umls/rxnorm/
⁵ https://www.wikipedia.org/
⁶ http://www.elasticsearch.org/

In the next step, we use the topical keywords as queries to a clinical
knowledge base, which is derived from Wikipedia⁵ articles (clinical
medicine category) and indexed using Elasticsearch⁶. This step aims to
find relationships between topical keywords and associated clinical
concepts (diagnoses/disorders, treatments, and tests) within a
comprehensive knowledge base for the purpose of biomedical evidence
retrieval. Wikipedia has been successfully used as a knowledge source by
the information extraction community over the last few years (Wu and Weld,
2010). Clinical concepts found in the Wikipedia articles are filtered
using various criteria (e.g., location, gender, and match with topical
keywords), and the resulting list of Wikipedia articles with relevant
clinical concepts is mined to retrieve a ranked list of inferred
diagnoses/tests/treatments corresponding to each given topic description.
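As a rough illustration of querying such a knowledge base, the sketch below builds a search request body in the Elasticsearch query DSL (a `bool` query with `should`, `filter`, and `minimum_should_match`). The field names `text`, `category`, and `applicable_gender`, the function name, and the choice to apply the gender criterion inside the query rather than as a post-processing step are all assumptions; the paper does not specify the index schema or the exact queries used.

```python
import json


def build_kb_query(keywords, gender=None, size=10):
    """Build an Elasticsearch-style request body as a plain dict.

    All field names are hypothetical; adapt them to the real index mapping.
    """
    # Each topical keyword becomes an optional full-text clause.
    should = [{"match": {"text": kw}} for kw in keywords]

    # Restrict to the clinical medicine category (assumed keyword field).
    filters = [{"term": {"category": "clinical medicine"}}]
    if gender is not None:
        filters.append({"term": {"applicable_gender": gender}})

    return {
        "size": size,
        "query": {
            "bool": {
                "should": should,
                "minimum_should_match": 1,  # at least one keyword must match
                "filter": filters,
            }
        },
    }


if __name__ == "__main__":
    body = build_kb_query(["chest pain", "dyspnea"], gender="female", size=5)
    print(json.dumps(body, indent=2))
```

The resulting dictionary could be sent as the body of a search request; titles of the top-scoring articles would then serve as candidate diagnoses, tests, or treatments.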

In the final step, topical keywords and the corresponding
disorders/diagnoses, tests, and treatments obtained from the clinical
knowledge base are used to retrieve candidate biomedical articles by
searching through TREC-CDS abstracts of PubMed Central articles. Candidate
articles are ranked using multiple weighting algorithms designed to
address each category of clinical questions (diagnosis, test, and
treatment). The retrieved biomedical articles are further filtered by
location, demographic information, and other parameters (e.g., species) to
improve the relevance of the results. The final list of the top 1000
biomedical articles is ordered by article publication date to support the
clinician’s synthesis of current research evidence related to the
questions for each topic description.
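The filter, cut-off, and date-ordering sequence in this final step can be sketched as follows. This is an illustration under stated assumptions, not the system's code: the `Article` record, the single `species` filter, and the function name are hypothetical, the relevance score is taken as given rather than computed by the paper's weighting algorithms, and the newest-first direction of the date ordering is an assumption (the paper says only that articles are ordered by publication date).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    """Hypothetical candidate record; field names are illustrative."""
    pmcid: str
    score: float        # relevance score from some weighting scheme
    published: date
    species: str = "human"


def finalize_ranking(candidates, top_k=1000, species="human"):
    """Filter candidates, keep the top_k by score, then order by date."""
    # 1. Filter out articles that fail the (assumed) species criterion.
    kept = [a for a in candidates if a.species == species]
    # 2. Keep the top_k highest-scoring articles.
    top = sorted(kept, key=lambda a: a.score, reverse=True)[:top_k]
    # 3. Order the surviving articles by publication date, newest first
    #    (the direction is an assumption).
    return sorted(top, key=lambda a: a.published, reverse=True)
```

Note that the cut-off is applied before the date ordering, so the date decides only the order within the selected set, not which articles are selected.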

3 Experimental Setup

3.1 Test Data

The test dataset comprises 30 topics divided into three question types as
mentioned above. The given topic descriptions (or topics) are essentially
medical case narratives that describe scenarios related to a patient’s
medical history, signs/symptoms, diagnoses, tests, and treatments. The
topics are provided in two versions depending on the depth of information.
Topic “descriptions” include comprehensive descriptions of the patient’s
situation, whereas topic “summaries” contain the most important
information. We used descriptions for our experiments in order to utilize
the unfiltered and richer context of the available patient information.

3.2 Corpus

The document collection for the track comes from the open access portion
of PubMed Central⁷ (PMC), a freely available online database of full-text
biomedical articles. The provided collection was a snapshot of the open
access subset and consisted of over 700,000 biomedical publications.

⁷ http://www.ncbi.nlm.nih.gov/pmc/
Figure 1: infAP scores for each topic

Figure 2: infNDCG scores for each topic

Figure 3: R-prec scores for each topic

Figure 4: Prec(10) scores for each topic


3.3 Evaluation and Analysis

The evaluation of the CDS track was conducted using the standard TREC
evaluation procedures for ad-hoc information retrieval tasks (Yilmaz et
al., 2008; Voorhees, 2014). The highest ranked biomedical articles were
sampled and judged by medical domain experts on a three-point scale (0:
not relevant, 1: possibly relevant, and 2: definitely relevant) depending
on the relevance of the answer to the associated question type about a
given case report.

Figures 1 to 4 show the overall scores of our system (prnal) across all
the topics (categorized into three groups: diagnosis, test, and treatment)
compared to the median and best scores across all the submitted automatic
runs for the following evaluation measures: inferred average precision⁸
(infAP), inferred normalized discounted cumulative gain⁹ (infNDCG),
precision at R, where R is the number of known relevant documents
(R-prec), and precision at 10 documents (Prec(10)). The two inferred
measures provide more accurate estimates of a system’s performance when
relevance judgments are incomplete due to dynamic and/or larger document
collections (Yilmaz and Aslam, 2006; Yilmaz et al., 2008). Together, the
evaluation measures used for the CDS track provide a sound view of the
quality of a system. The reported results show that our clinical question
answering system mostly performs close to the median scores for all
evaluation measures.

⁸ Average Precision (AP) is a measure that combines precision and recall
for evaluating systems that retrieve a ranked list of articles. In
particular, AP is the mean of the precision scores after each relevant
article is retrieved.

⁹ Discounted Cumulative Gain (DCG) measures the quality of ranking for a
system when it retrieves a ranked list of results and the results are
graded with relevance judgments. In particular, DCG computes the
usefulness of an article based on its rank in the retrieved list.
Normalized DCG (NDCG) is computed by using the maximum possible DCG
(calculated by sorting the result list by relevance) as the normalization
factor.
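The measures defined above can be made concrete with a short sketch. It implements the standard (fully judged) forms of AP, R-prec, Prec(k), and NDCG, not the inferred, sampling-based variants (infAP, infNDCG) actually reported for the track. The function names are illustrative, the logarithmic discount log2(rank + 1) is one common choice that the footnote does not spell out, and NDCG is normalized by re-sorting the retrieved list, as the footnote describes.

```python
import math


def precision_at_k(relevant_flags, k):
    """Fraction of the top-k retrieved articles that are relevant."""
    return sum(relevant_flags[:k]) / k


def average_precision(relevant_flags, total_relevant):
    """Mean of the precision values at each rank where a relevant article
    appears; relevant articles never retrieved contribute zero."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant if total_relevant else 0.0


def r_precision(relevant_flags, total_relevant):
    """Precision at rank R, where R is the number of known relevant articles."""
    if total_relevant == 0:
        return 0.0
    return sum(relevant_flags[:total_relevant]) / total_relevant


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG divided by the DCG of the same list sorted by relevance."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

Here `relevant_flags` is a list of 0/1 values in ranked order, and `gains` holds the graded judgments (0, 1, or 2) in ranked order.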


Analysis of these results also shows that our clinical question answering
system achieved the best scores in two of eight dual-judged topics (#5 and
27) and outperformed the median scores for topics #13, 18, 19, 22, and 23.
These results further illustrate the system’s overall performance in
answering the various question types represented in the topic
descriptions.

4 Conclusion

In this paper, we described our participation in the inaugural TREC 2014
Clinical Decision Support track. Evaluation results showed that our
clinical question answering system performed close to the median scores on
most topics and achieved the best scores on two of the eight dual-judged
topics. Next steps include improving the system's performance with more
domain-specific clinical knowledge bases and additional NLP algorithms
(e.g., paraphrasing and textual entailment) for better clinical reasoning
and question answering.